Construction of Mizo – English Parallel Corpus for Machine Translation
Abstract
A parallel corpus is a key component of statistical and Neural Machine Translation (NMT). While most research focuses on machine translation itself, corpus creation studies are limited for many languages, and no paper on Mizo–English corpus creation exists yet. A high-quality parallel corpus is required for Natural Language Processing (NLP) activities including chatbots, transliteration, and cross-language information retrieval. This work aims to investigate parallel corpus creation techniques and apply them to the Mizo–English language pair. Another goal is to test machine translation on the newly constructed corpus. We contributed to the LF Aligner tool so that it supports Mizo sentence alignment in corpus development. Our effort created the first large-scale Mizo–English parallel corpus, with over 529K sentences. The pre-processed corpus was used for Mizo-to-English NMT, which was evaluated using BLEU, ChrF, and TER scores. The system achieved BLEU 45.08, ChrF 65.36, and TER 41.16, setting a new benchmark for Mizo-to-English translation.
Similar resources
Catalan-English statistical machine translation without a parallel corpus
This paper presents a full experiment on large-vocabulary Catalan-English statistical machine translation without an English-Catalan parallel corpus, in the context of the debates of the European Parliament. For this, we make use of an English-Spanish European Parliament Proceedings parallel corpus and a Spanish-Catalan general newspaper parallel corpus, both of which of more than 30 M words. G...
UM-Corpus: A Large English-Chinese Parallel Corpus for Statistical Machine Translation
Parallel corpus is a valuable resource for cross-language information retrieval and data-driven natural language processing systems, especially for Statistical Machine Translation (SMT). However, most existing parallel corpora to Chinese are subject to in-house use, while others are domain specific and limited in size. To a certain degree, this limits the SMT research. This paper describes the ...
Automatic Construction of Translation Knowledge for Corpus-based Machine Translation
Many machine translation (MT) systems that utilize the knowledge automatically acquired from bilingual corpora have been proposed in conjunction with efforts to accumulate corpora. We call this approach corpus-based machine translation in this thesis. This thesis focuses on automatic construction of the translation knowledge needed for corpus-based MT and discusses the following three tasks. 1....
A Corpus-Based Study of Units of Translation in English-Persian Literary Translation
No abstract available.
HindEnCorp - Hindi-English and Hindi-only Corpus for Machine Translation
∗Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics †Charles University in Prague, Faculty of Arts, Department of Linguistics ‡Natural Language Processing Centre, Faculty of Informatics, Masaryk University. Abstract: We present HindEnCorp, a parallel corp...
Journal
Journal title: ACM Transactions on Asian and Low-Resource Language Information Processing
Year: 2023
ISSN: 2375-4699, 2375-4702
DOI: https://doi.org/10.1145/3610404